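Course information
Course title
翻譯科技基礎程式撰寫(一)
Python for Translation Technology I
Semester
112-1
Offered to
College of Liberal Arts, Graduate Program in Translation and Interpretation
Instructor
徐嘉煜
Course number
GPTI5035
Course identifier
147 U0350
Section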
 
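Credits
3.0
Full/half year
Half year
Required/elective
Elective
Time
Wednesday, periods 5, 6, 7 (12:20~15:10)
Location
博雅304
Remarks
Priority is given to students in the translation program. Auditing is not permitted. Students in the Chinese-English translation program should obtain an authorization code from the instructor after the semester begins in order to add the course.
Restricted to students of this department or program (including minor and double-major students).
Enrollment cap: 16 students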
 
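Course syllabus
To protect everyone's rights, please respect intellectual property rights and do not make illegal photocopies.
Course overview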

Basic computer programming (or “coding”) skills are not (yet) in a professional translator’s standard toolbox, but it’s about time they were! In this text-oriented course, we’ll learn how to use Python, currently the go-to language for machine learning, data science and artificial intelligence tasks, to process Chinese and English texts and corpora and to integrate the results into our standard translation workflow. If you already know your way around Word, Excel, CAT tools and other apps, these newly acquired coding skills will augment your computer know-how. Take the plunge and be amazed at what you can do with only the most basic Python syntax and library features!

Course objectives
• To know how to write simple Python code to solve a variety of problems related to text and corpus processing;
• To learn the basics of Regular Expressions, a powerful pattern-matching mini-language;
• To learn how to write web crawlers that download web pages from the Internet, analyze the different elements of an HTML document, and extract useful textual data from it;
• To learn how to “tokenize” Chinese and English texts; for Chinese, this means breaking down a string of Chinese characters [字] into word [詞] tokens, a process otherwise known as Chinese word segmentation [分詞/斷詞];
• To learn basic UNIX/Linux/macOS command line tools (run from the Command Prompt or a terminal) to process very large text files (millions of lines).
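To make the word segmentation objective more concrete, here is a minimal sketch of dictionary-based forward maximum matching, one simple way to break a string of Chinese characters into word tokens. The function name `segment` and the tiny vocabulary are made up for illustration; real segmenters rely on much larger lexicons or statistical models, and this is not necessarily the approach used in class.

```python
def segment(text, vocab, max_len=4):
    """Greedy forward maximum matching over a toy vocabulary."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink it one character at a time.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # A single character is always accepted as a fallback token.
            if size == 1 or piece in vocab:
                tokens.append(piece)
                i += size
                break
    return tokens


# Hypothetical mini-lexicon, for illustration only.
VOCAB = {"翻譯", "科技", "基礎", "程式", "撰寫"}

print(segment("翻譯科技基礎程式撰寫", VOCAB))
```

Characters that are not covered by the vocabulary simply come out as single-character tokens, which is why the quality of the lexicon matters so much for this method.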
Course requirements
There will be weekly programming assignments and a term project with a final report.
Expected weekly study hours outside class
5 hrs/week
Office Hours
By appointment
Required reading
• Hunt, J. 2023. A Beginners Guide to Python 3 Programming, 2nd ed. Springer.
References
• Paquot, M. and Gries, S. Th. (eds.) 2020. A Practical Handbook of Corpus Linguistics. Springer.
• Stephenson, B. 2019. The Python Workbook: A Brief Introduction with Exercises and Solutions, 2nd ed. Springer.
• Milliken, C. 2020. Python Projects for Beginners: A Ten-Week Bootcamp Approach to Python Programming. Springer.
• Sheppard, K. 2021. Python Introduction. https://www.kevinsheppard.com/teaching/python/course/
(To be updated)
Grading
(For reference only)
No.   Percentage   Description
1.    15%          Attendance and class participation
2.    65%          Weekly assignments
3.    20%          Final report and presentation (oral + written)
 
Course schedule
Week
Date
Topic
Week 1
Introduction to Python: setting up a productive coding environment. [Laptop installation] Anaconda (Jupyter, Spyder); [Cloud-based] Amazon SageMaker Studio Lab, Google Colab, Binder + GitHub
Week 2
Python (the language): Basic data types and data structures: int, float, str, list, tuple, set, dict
Week 3
Python (the language): Basic syntax, conditional expressions and flow control, and looping: if – elif – else; for, while
Week 4
The standard library and user-contributed third-party packages (PyPI); roll-your-own (user-defined) functions
Week 5
Manipulating text files: reading and writing text files; character sets and encodings; the Unicode standard; the JSON data format
Week 6
Text tokenization and related tasks:
Word: Chinese word segmentation, English word tokenization;
Sentence: sentence splitting;
Conversion between traditional and simplified Chinese characters
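For a feel of what sentence splitting and English tokenization involve, here is a deliberately naive sketch that uses only the standard `re` module. The function names `split_sentences_zh` and `tokenize_en` are hypothetical, and the patterns ignore many real-world cases (abbreviations, quotation marks, hyphenated words), so treat it as an approximation rather than a production tokenizer.

```python
import re


def split_sentences_zh(text):
    """Split on the full-width sentence enders 。!?, keeping each ender."""
    pieces = re.findall(r"[^。!?]+[。!?]?", text)
    return [p.strip() for p in pieces if p.strip()]


def tokenize_en(text):
    """Return words (with an optional internal apostrophe) and punctuation marks."""
    return re.findall(r"\w+(?:'\w+)?|[^\w\s]", text)


print(split_sentences_zh("今天天氣很好。你要出門嗎?我們走吧!"))
print(tokenize_en("Don't panic, it's only Python!"))
```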
Week 7
Searching text for words and phrases via patterns:
Regular Expressions I: using the regex engines of various text editors (CudaText, Visual Studio Code)
Week 8
Regular Expressions II: How to use regexes in Python with a regex library
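The following is a small sketch of regexes in Python using the standard library's `re` module; the course may well use other libraries or patterns. It finds ISO-style dates in an invented sample sentence and rewrites them in day/month/year order.

```python
import re

# Invented sample text.
text = "Invoice 2023-10-04 was revised on 2023-11-15."

# Named groups make the parts of each match easy to reference.
date_pattern = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})")

# findall returns one tuple of captured groups per match.
dates = date_pattern.findall(text)

# sub can reorder the captured groups, here into day/month/year.
rewritten = date_pattern.sub(r"\g<day>/\g<month>/\g<year>", text)

print(dates)
print(rewritten)
```

Compiling the pattern once and reusing it keeps the search and the replacement in sync, since both rely on the same group names.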
Week 9
Accessing databases: SQLite3; basic SQL syntax; DB Browser for SQLite (a GUI for browsing and querying data)
Creating TMX (Translation Memory eXchange) files, the standard translation memory format, from plain text, Excel and Word files
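To show how the two halves of this unit can fit together, the sketch below stores two invented segment pairs in an in-memory SQLite database, reads them back with SQL, and wraps them in a bare-bones TMX 1.4 document. The table name `tm`, the header values such as `demo-script`, and the language codes are assumptions chosen for illustration; a real project would read its segments from text, Excel or Word files.

```python
import sqlite3
import xml.etree.ElementTree as ET

# ElementTree serializes this qualified name as the xml:lang attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical sample segments.
pairs = [("Good morning.", "早安。"), ("Thank you.", "謝謝。")]

# Store the pairs in an in-memory database and read them back in order.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tm (id INTEGER PRIMARY KEY, en TEXT, zh TEXT)")
conn.executemany("INSERT INTO tm (en, zh) VALUES (?, ?)", pairs)
rows = conn.execute("SELECT en, zh FROM tm ORDER BY id").fetchall()
conn.close()

# Build a minimal TMX tree: header, then one <tu> per segment pair.
tmx = ET.Element("tmx", version="1.4")
ET.SubElement(tmx, "header", {
    "creationtool": "demo-script", "creationtoolversion": "0.1",
    "datatype": "plaintext", "segtype": "sentence",
    "adminlang": "en", "srclang": "en", "o-tmf": "sqlite",
})
body = ET.SubElement(tmx, "body")
for en, zh in rows:
    tu = ET.SubElement(body, "tu")
    for lang, seg_text in (("en", en), ("zh-TW", zh)):
        tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
        ET.SubElement(tuv, "seg").text = seg_text

tmx_text = ET.tostring(tmx, encoding="unicode")
print(tmx_text)
```

Using placeholders (`?`) in the INSERT statement lets SQLite handle quoting, which matters once segments contain apostrophes or quotation marks.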
Week 10
Tapping into the vast textual data sources on the Internet: Web scraping I (downloading texts) and understanding HTML syntax
Week 11
Web scraping II
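As an illustration of extracting text from HTML, here is a minimal sketch built on the standard library's `html.parser`. The class name `ParagraphExtractor` and the inline page are made up; a real scraper would first download the page over HTTP (for example with `urllib.request.urlopen`), and the course may use other tools for this.

```python
from html.parser import HTMLParser


class ParagraphExtractor(HTMLParser):
    """Collect the text found inside <p> elements."""

    def __init__(self):
        super().__init__()
        self.in_p = False
        self.paragraphs = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.in_p = True
            self.paragraphs.append("")

    def handle_endtag(self, tag):
        if tag == "p":
            self.in_p = False

    def handle_data(self, data):
        # Text inside nested tags such as <b> still belongs to the paragraph.
        if self.in_p:
            self.paragraphs[-1] += data


# Stand-in for a downloaded page; no network access is needed here.
html_doc = """<html><head><title>Demo</title></head>
<body><h1>News</h1>
<p>First <b>bilingual</b> paragraph.</p>
<p>第二段文字。</p>
</body></html>"""

parser = ParagraphExtractor()
parser.feed(html_doc)
parser.close()
print(parser.paragraphs)
```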
Week 12
Translation Corpus Construction:
1. Bitext alignment at the sentence level;
2. Bitext alignment at the word level
Week 13
Translation Corpus Construction (cont’d):
3. Creating a web interface for a bilingual concordancer;
4. Creating your own off-line dictionaries and terminology look-up systems (重編國語辭典, 簡明成語典)
Week 14
Processing large amounts of data with the command line interface (CLI): Introduction to Bash utilities: grep, cat, cut, awk, sed, wc, wget, and others
Week 15
Optical Character Recognition (OCR) with open-source tools
Week 16
Final report presentation (oral)